Experiments on Spam Detection with Boosting, SVM and Naive Bayes
Abstract
For this project, I implement three popular text classification algorithms for spam detection: AdaBoost, Support Vector Machines (SVM) and Naive Bayes. Their performance is evaluated on several test datasets. All experiments are done in MATLAB. The experiments show that all three algorithms perform satisfactorily on spam detection. In terms of accuracy, AdaBoost has the best error bound. On the other hand, Naive Bayes and SVM are superior in training and classification speed. In terms of overall performance, Naive Bayes is an ideal algorithm for spam detection.
Related resources
Sentiment Based Twitter Spam Detection
Spam is becoming a serious threat to users of online social networks, especially ones like Twitter. Twitter’s structural features make it more vulnerable to spam attacks. In this paper, we propose a spam detection approach for Twitter based on sentiment features. We perform our experiments on a data collection of 29K tweets, with 1K tweets for each of 29 trending topics of 2012 on twitt...
Ensemble of SVM Classifiers for Spam Filtering
Unsolicited commercial email, also known as spam, is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000), but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting and d...
Survey on Text Classification (Spam) Using Machine Learning
E-mail spam is a very serious problem in today’s life. It has many consequences: it lowers productivity, occupies space in mailboxes, spreads viruses, Trojans, and materials containing potentially harmful information for a certain category of users, and destroys the stability of mail servers. As a result, users spend a lot of time sorting incoming mail and deleting undesirable corresponde...
Boosting Trees for Anti-Spam Email Filtering
This paper describes a set of comparative experiments for the problem of automatically filtering unwanted electronic mail messages. Several variants of the AdaBoost algorithm with confidence-rated predictions (Schapire & Singer 99) have been applied, which differ in the complexity of the base learners considered. Two main conclusions can be drawn from our experiments: a) The boosting-based met...
A new feature selection algorithm based on binomial hypothesis testing for spam filtering
Content-based spam filtering is a binary text categorization problem. Feature selection, an important and indispensable means of text categorization, also plays an important role in improving the performance of spam filtering. We proposed a new method, named Bi-Test, which utilizes binomial hypothesis testing to estimate whether the probability of a feature belonging to ...
Publication date: 2008